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Abstract 


This paper introduces a tensor-Krylov method, the tensor-GMRES method, for large 
sparse systems of nonlinear equations. This method is a coupling of tensor model forma- 
tion and solution techniques for nonlinear equations with Krylov subspace projection 
techniques for unsymmetric systems of linear equations. Traditional tensor methods 
for nonlinear equations are based on a quadratic model of the nonlinear function, a 
standard linear model augmented by a simple second order term. These methods are 
shown to be significantly more efficient than standard methods both on nonsingular 
problems and on problems where the Jacobian matrix at the solution is singular. A 
major disadvantage of the traditional tensor methods is that the solution of the tensor 
model requires the factorization of the Jacobian matrix, which may not be suitable for 
problems where the Jacobian matrix is large and has a “bad” sparsity structure for an 
efficient factorization. We overcome this difficulty by forming and solving the tensor 
model using an extension of a Newton-GMRES scheme. Like traditional tensor meth- 
ods, we show that the new tensor method has significant computational advantages over 
the analogous Newton counterpart. Consistent with Krylov subspace based methods, 
the new tensor method does not depend on the factorization of the Jacobian matrix. 
As a matter of fact, the Jacobian matrix is never needed explicitly. 


1 Introduction 

This paper introduces a tensor-Krylov method for solving the nonlinear equations problem 

given F :%l N —> find x. £ $t N such that F(x.) = 0. (1.1) 

Standard methods (such as Newton’s method) widely used in practice for solving (1.1) are 
iterative methods that base each iteration upon a linear model of F at the current point x c , 

M(x c + d) = F{x c ) + J c d, (1-2) 

where d £ %t N and J c £ $ NxN is either the current Jacobian matrix or an approximation 
to it. When the size of J c is moderate or the sparsity structure of J c is favorable, many 
effective ways of solving the Unear model are based on the factorization of J c (for example LU 
factorization). However, for many real world problems (often arisen in numerical solution 
of ODEs and PDEs), J c is often large and has a sparsity structure that does not allow a 
sparse factorization. The density of a full factorization would be so great that the storage 
of such a factorization would be impossible even on the most powerful computer available 
today due to unavoidable massive fill-ins. An attractive alternative to solving the Unear 
model is the use of Krylov methods, such as the GMRES method, which does not require 
the factorization of J c . The distinct advantage of Krylov methods is their minimum storage 
requirement and potential matrix-free implementations. Newton-Uke iteration schemes for 
solving the nonUnear equations problem using Krylov subspace projection methods as an 
inner Unear solver are considered by many authors including Brown and Saad [5, 4], Chan 
and Jackson [6], and Brown and Hindmarsh [3]. Their computational results show that 
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these methods can be quite effective for many classes of problems in the context of systems 
of partial differential equations or ordinary differential equations. 

The distinguishing feature of the Newton equation based algorithm is that if F\x c ) 
is Lipschitz continuous in a neighborhood containing the root x*, F f (x+) is nonsingular 
and (1.2) is solved exactly (or to certain accuracy, e.g. see [4] for more details), then 
the sequence of iterates produced converges locally and q-quadratically to x*. This means 
eventual fast convergence in practice. However, Newton’s (or Newton-like) method is not 
usually quickly locally convergent, if F'(x*) is singular. This situation is analyzed and 
acceleration techniques are suggested by many authors, including Reddien [23], Decker and 
Kelley [8], [9], [10], Decker, Keller and Kelley [7], Kelley and Suresh [17], Griewank and 
Osborne [15], and Griewank [14]. In summary, their papers show that when the Jacobian 
at the solution has a null space of dimension one, then from good starting points, Newton’s 
method is locally Q-linearly convergent with constant converging to The acceleration 
techniques presented in these papers depend upon a priori knowledge that the problem is 
singular. 

Tensor methods for nonlinear equations introduced by Schnabel and Frank [26] are 
intended to be efficient both for nonsingular problems and for problems with low rank 
deficiency. These methods augment the standard linear model by a low rank second order 
term, in a way that requires no additional function or Jacobian evaluations per iteration, 
and hardly more arithmetic per iteration or total storage, than Newton’s method. The 
second order term supplies higher order information in recent step directions; when the 
Jacobian is (nearly) singular, this usually results in supplying information in directions 
where the Jacobian lacks information or correspondingly, where the second order terms 
have the greatest influence. Tensor methods are shown to be considerably more efficient 
and robust than standard methods on both singular and nonsingular systems of nonlinear 
equations, with a larger margin of advantage on singular problems. As a matter of fact, 
Feng, Frank and Schnabel [13] show that on an appreciable class of singular problems, 
tensor model based methods exhibit multi-step (2-step or 3-step) q-superline convergence, 
whereas Newton’s method has only linear convergence. 

The traditional tensor methods are based on the factorization of the Jacobian matrix 
at each iteration, which makes them unsuitable for large systems of nonlinear equations 
where the factorization of the Jacobian matrix is too expensive. The goal of this paper is 
to develop a tensor-like iterative scheme for solving systems of nonlinear equations using 
Krylov subspace projection techniques. We will refer this method as the tensor-GMRES 
method. This method is independent of Jacobian factorization and can have Jacobian-free 
implementations. In addition, this method is intended to inherit the advantage of tradi- 
tional tensor methods over the standard Newton’s method both on singular and nonsingular 
problems. 

Tensor-Krylov methods were first considered by Bouaricha in his Ph.D. thesis [2]. The 
basic idea is to solve the tensor model by calling a Krylov method for linear equations 
twice in each tensor iteration. Although the second call of the Krylov method might be 
less expensive (due to possible good initial guess) close to the solution, the computational 
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cost of one iteration of tensor methods based on this idea is likely twice as expensive as 
one iteration of analogous Newton-Krylov methods when away from the solution. This 
difficulty could make these tensor methods not competitive with its Newton counterpart in 
many situations. 

This paper gives the tensor-GMRES method that requires no more function and deriva- 
tive evaluations, and hardly more storage or arithmetic per iteration, than the analogous 
N ewton- G M RES method. This is achieved by asking the tensor term in the tensor model 
to have a more restricted form than that in the traditional tensor model. The restriction 
imposed will have minimal impact on the performance of the tensor method, which is partic- 
ularly true for problems where the Jacobian matrix is rank deficient or ill-conditioned at the 
solution. We discuss the formation and solution of the tensor model, and present our com- 
putational results. Like many Newton-Krylov algorithms, we show that the tensor-GMRES 
method introduced here can have efficient matrix free implementations. 

We should point out that the basic idea of this paper can be used as a guidance for 
devising related tensor-Krylov methods such as tensor-Arnoldi, tensor-QMR and tensor- 
BiCG (these are under consideration by the authors). Due to nontrivial technical differences 
between these Krylov subspace based linear system solvers, we do not attempt to give a 
unified treatment of these tensor-Krylov methods, instead, we only concentrate on the 
tensor-GMRES method in this paper. 

We would like to introduce some notation that will be used later on in this paper. We 
denote the solution to the system by x„, and a current iterate by x c or x k throughout this 
paper. Consistent with tradition, we denote F'(x) by J(x ) and usually abbreviate J{x c ), 
J[x m ) as J c , J* respectively. Similarly, we often abbreviate F(x c ), F(x*), F {x c ), and 
F"(x*) as F c , F„ , F" , and F" respectively. The notation || • || denotes the Euclidean vector 
norm. We use N to denote the length of x, which is the number of variables (equations 
also) in the system. 

This paper is organized as follows. Section 2 briefly reviews the Arnoldi process, the 
GMRES algorithm and a line search Newton-GMRES algorithm. The traditional tensor 
methods for nonlinear equations are reviewed in Section 3. The main contribution of this 
paper is Section 4 which introduces the formation and solution of the new tensor-GMRES 
model. The implementation of the tensor-GMRES algorithm is given in Section 5. Compar- 
ative test results for our implementation of the tensor-GMRES method versus the analogous 
implementation of the Newton-GMRES method are also reported in this section. Finally, 
in Section 6, we summarize our research and make some brief comments on areas for future 
related research. 

2 Newton-GMRES method for systems of nonlinear equa- 
tions 

The GMRES (Generalized Minimal RESidual) method was introduced by Saad and Schultz 
[24] for solving large unsymmetric systems of linear equations. This method is very effective 
when coupled with preconditioning techniques. It is also very competitive compared to other 
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iterative methods. Since the GMRES method is a Krylov subspace method, we first give a 
brief review of the Krylov subspace. 

Given a matrix A 6 9t NxN , a vector v\ € %t N and an integer m > 1, the Krylov subspace 
associated with A , v\ and m is defined as 

K m (A,vi) = span{vi,Av 1 ,A 2 vi,- ■ •,A m ~ l vi}. 

Consider a system of linear equations 

Ax = 6. (2.1) 

Given an initial guess x 0 to the solution of the linear system, the initial residual is defined 
as 


r 0 - Ax o - b. 

The GMRES method attempts to find 6 K m (A , r 0 ) such that the residual vector A(x 0 + 
z m ) - b is small, or in other words, x 0 + (approximately) solves (2.1). This is done in a 
fashion that at each iteration the residual norm is minimized. 

The GMRES method is based on the Arnoldi process [1] which uses the Gram-Schmidt 
method to compute an / 2 -orthonormal basis {tq, t> 2 , • • • , u m } of the K m (A,v) as follows. 

Algorithm A: Arnoldi. 

(A-l) Start. Choose a vector v\ such that ||vi|| = 1 
(A-2) Iterate. For j — 1, 2, • • •, do 

= ( A v j , Vi ) , i — 1 , 2 , • • • , j , 

j 

*b'+i = A v j — h UJ v t , 

i= 1 

hj+ij = ||*j+i||, 


As consequences of m iterations of the Arnoldi algorithm (assume it does not break down, 
i.e., ||v j+ i|| does not vanish throughout), we have m + 1 orthonormal vectors Vi, * • * , v m +i, 
and an (m+l)xm Hessenberg matrix H m whose nonzero entries are given by hi j produced 
by the algorithm. Let V m = [t?i, * • *, u m ]. As an important fact, the relation 

AV m — Vm + lflm 


holds after each Arnoldi iteration. 

The GMRES scheme is based on solving the least squares problem: 

min ||/ - A(x 0 + z m )\\ = min \\r 0 - Az m \l (2.2) 

£rr»C-**m z m £ m 
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where r 0 = / - Ax 0 . If we set z m = V m y, m = r 0 /||r 0 || and 0 = ||r 0 ||, this is equivalent to 
solving 

min \\0vi - AVmj/H 
= min \\V m+ \(0e-i - H m y)\\ 

= min \\0e x - H m y\\. (2-3) 

The least squares problem (2.3) is solved via a QR factorization of H m -, which is fairly 
inexpensive because of the Hessenberg form of H m . When m is small the cost of solving 
(2.3) is minimal. Based on above observations, the GMRES algorithm for solving systems 
of linear equations is devised. 

Algorithm G: GMRES. 


(G-l) Start. Choose x 0 and compute r 0 = f - Ax 0 and v\ — ro/ll r o||- 
(G-2) Iterate. For j = 1,2 , • • ■ , m, • • • until satisfied do: 



= (Avj,vi),i = 

Vj + 1 

3 

= Av i ~ 

i = l 

— Il^j+l|l> 


Vj+i 

— fy+i/^j+W- 


(G-3) Form the approximate solution: 

X m = XQ + V m y m where y m minimizes \\0ei - H m y m ||, y 6 & m - 

Due to memory limitation, it is necessary to restrain the number of Arnoldi iterations 
taken in (G-2). This leads to restarted versions of GMRES. The idea is to use the GM- 
RES iteratively by restarting the algorithm every m steps, where m is some fixed integer 
parameter. Using the GMRES as a linear system solver, one can obtain a Newton-GMRES 
algorithm for nonlinear equations. At each iteration of the nonlinear algorithm, a (or an 
approximate) solution to the linear system 

J c d = —F c , (2-4) 

with J c being the current Jacobian matrix and F c being the current function value, must 
be obtained. The Newton-GMRES method is an inexact Newton method, in the sense 
that at each iteration, the Newton-like step is obtained by solving the Newton equation 
approximately instead of “exactly”. The step obtained in this way is required to be a 
descent direction for the function 4||F(x)|| 2 . As a matter of fact, when the Newton equation 
is solved accurately enough (see [4] for details), the step obtained always gives a descent 
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direction for ^||F(x)|| 2 . Afterwards, a global convergence strategy such as backtracking line 
search is employed to determine the step length along this descent direction, which will 
force progress towards the solution. 

Algorithm NG: An iteration of the Newton-GMRES. 

Given x k , J k £ and F ' k G 

(NG-1) Choose e k £ [0, 1). 

(NG-2) Do GMRES (restart if necessary) to find d n = d 0 + V m y n such that 

Fk + Jkd" = r k , with ||r fc ||/||F fc || < e k , 

where do is the initial guess to the solution of the Newton equation, and the 
columns of V m form an orthonormal basis for the Krylov space generated by the 
Arnoldi process. 

(NG-3) Find A > 0 using a backtracking line search global strategy and form the next 
iterate x k +\ = x k + A d n . 

The residual vector r k is the amount by which d n fails to satisfy the Newton equation 
J k d + F k = 0. The forcing sequence e k is used to control the level of accuracy. The global 
and local convergence of inexact Newton methods is analyzed by Brown and Saad [4]. Their 
theory implies that if the sequence e k —+ 0, then under conditions (such as the Jacobian 
matrix is nonsingular at the solution) the iterates generated by Algorithm NG converges 
to the solution superlinearly; the convergence is quadratic if e k = 0(||Fjt||). This means 
eventual fast convergence in practice for nonsingular problems. 

An attractive feature of Newton-Krylov algorithms is that the explicit computation of 
the Jacobian matrix is never needed. This is owing to the fact that the only computation 
involving the Jacobian matrix is the product of the Jacobian matrix and a vector, which 
can be approximated by finite difference 

rV F ( x + av ) - F ( x ) 

X)V — 

a 

with a small number a (see e.g. [11] for details). 

3 Traditional tensor methods for nonlinear equations 

The tensor model introduced by Schnabel and Frank [26] is a quadratic model of F(x) 
formed by adding a second order term to a Unear model, giving 

M t (x c + d) = F{x e ) + J c d + ^T c dd, (3.1) 

where T c £ SfcNxNxN j s intended to supply second order information about F(x) around 
x c . The second derivative of F(x) at x c , F"(x c ) € ft NxN * N is an obvious choice for T c in 


(2.5) 
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(3.1). However this choice for T c has several serious disadvantages that preclude its use in 
practice. These include the computation of N 3 / 2 second partial derivatives of F(x ) and a 
storage requirement of at least N 3 / 2 real numbers for F"(x c ). Furthermore, to utilize the 
model (3.1) with T c = F"(x c ), at each iteration one would have to solve a system of N 
quadratic equations in N unknowns, which is expensive and might not have a root. 

The difficulties associated with the use of T c = F"(x) in (3.1) are overcome in tensor 
methods by choosing T c to have a restricted low rank form. This can be considered as an 
extension to second order objects of the low rank update methods used to approximate 
Jacobian or Hessian matrices in secant (quasi- Newton) methods. One difference is that 
for reasons of efficiency in arithmetic cost and storage, at each iteration the zero tensor is 
updated rather than the tensor from the previous iteration. 

Formation of the tensor model in Schnabel and Frank [26] is based upon the interpo- 
lation of information from past iterates, and requires no additional function or derivative 
evaluations. This is done by selecting some set of independent past iterates 
and requiring the model (3.1) to interpolate the function values F(x _ k ) at these points. 
That is, the model is required to satisfy 

F{x- k) = F(x c ) + F'(x c )s k + ^ T c s k s k , k - 1,- • - ,p, 

where s k = X- k - *« k = 1, • • -,p. The directions {s k } are required to be strongly linearly 
independent, which usually results in p being 1 or 2, although an upper bound of p < VN 
is permitted. Then T c is chosen to satisfy 

min X[€R n*nx« ||T c ||f (3.2) 

subject to T c s k s k = z k , k = 1, • • • ,p, 

where ||T c ||f, the Frobenius norm of T c is defined by 

ii^cHf = yiyi y^(r c [*,j, > 

t=l j = 1 k= 1 

and z k e is defined as z k = 2 (F(x_*) - F(x c ) - F'(x c )s k ). The solution of (3.2) is given 

by 

T c = j2a k s k s k , (3-3) 

k = 1 

where a fc is the fcth column of A e ® Nxp , A is defined by A = ZM ~ 1 , M € » pXp is defined 
by M[i,j] = ( sfsj ) 2 , 1 <i,j <P, and Z 6 K 7Vxp by column k of Z = z k , k = 1, ■ • -,p. 
Substituting (3.3) into the tensor model (3.1) gives 

V 

M T (x c + d) = F(x c ) + F'(x c )d+iy^a fc (srd) 2 . (3.4) 

fc=i 
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The additional storage required by T c is 2 p 7V-vec tors, for {a*} and In addition, the 2 p 

n- vectors {x_*} and {F(x-k)} must be stored. Thus the total extra storage required for the 
tensor model is at most 47V 1 * 5 since p < \/7V, which is likely to be small compared to the TV 2 
storage required for the Jacobian. The entire process for forming T c requires N 2 p + 0(Np 2 ) 
multiplications and additions. The leading term comes from calculating the p matrix-vector 
products F f (x c )sk, k = 1, * • -,p; the cost of solving A = ZM~ l is 0(Np 2 ). Since p < Vn, 
the leading term in the cost of forming the tensor model is at most TV 2 * 5 multiplications and 
additions per iteration, but it is usually only a small multiple of TV 2 arithmetic operations in 
the dense case since p is usually 1 or 2. This cost also is small compared to the at least 7V 3 /3 
multiplications and additions per iteration required for the dense matrix factorizations by 
standard methods that use analytic or finite difference derivatives. If F'(x) is large and 
sparse, the additional cost is simply p matrix-vector products, which generally is small in 
comparison to the costs of a standard iteration. 

Solution of the tensor model with the special form of T c given by (3.3) also can be 
performed efficiently in terms of algorithmic operations. The goal is to find a root of the 
tensor model (3.4), that is, 

find d G %l N such that 

p 

M t (x c + d)= F(x e ) + F'(x c )d + \ Y, ak(sld) 2 = 0. (3.5) 

k-l 

Since (3.5) may not have a root, it is generalized to solving 

min \\Mt{x c + d)\\ 2 . (3.6) 

de# N 

Schnabel and Frank [26] show that the solution of (3.6) can be reduced, in 0(N 2 p) 
operations, to the least squares solution of a system of q quadratic equations in p unknowns, 
plus the solution of a system of TV — q linear equations and N -p unknowns. (Usually, q — p; 
the exceptional case q > p arises when the system of TV — p linear equations and TV — p 
unknowns would be singular and generally only occurs when rank(F f (x c )) < N — p.) This 
reduction is carried out by performing orthogonal transformations of both the variable and 
equation spaces in a way that isolates the quadratic terms into only p equations. The details 
of this process are not important to this paper, because here we deal solely with models 
where p = 1 , in which case the tensor model can be solved much more simply and in closed 
form. For these reasons we do not discuss the solution algorithm further in this paper; for 
details, see [26], The total cost of solving the tensor model is about 27V 3 /3 + N 2 p + 0(N 2 ) 
multiplications and additions in the dense case, at most N 2 p < TV 2,5 multiplications more 
than the QR factorization of an TV x TV matrix. The process generally is numerically stable 
even if F\x c ) is singular but has rank > TV - p. If F f (x c ) is nonsingular, the Newton step 
can be obtained very cheaply as a by-product of the tensor model solution process. If F f {x c ) 
is large and sparse, the tensor model solution still costs very little more than the standard 
Newton iteration, see [2]. 

In practice, computational results in [26] show the tensor method is more efficient than 
an analogous standard method based upon Newton’s method on both nonsingular and 
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singular problems, with a particularly large advantage on singular problems. In tests on a 
standard set of nonsingular test problems, the tensor method is almost always more efficient 
than the standard method and is never significantly less efficient. The average improvement 
by the tensor method is 21 — 23%, in terms of iterations or function evaluations, on all test 
problems, and 36 - 39% on the harder problems where one method requires at least ten 
iterations. The tensor method is considerably more efficient than the standard method 
on problems with rank(F'(x .)) = N - 1; the average improvement is 40 - 43% on all 
problems and 57 - 61% on the harder problems. The advantage of the tensor method over 
the standard method on problems with rank(F'(x,)) — N - 2 is not as great as for the 
rank( F'(a;,)) = N - 1 case but is still considerable, an average of 27 - 37% improvement on 
all problems and 57-65% on the harder problems. More recent computational experiments 
in [2], including experiments on much larger problems, show similar advantages for tensor 
methods. 

It is shown in [13] that under mild conditions that the sequence of iterates generated 
by the tensor method based upon an ideal tensor model converges locally and two-step 
Q-superlinearly to the solution with Q-order §, and that the sequence of iterates generated 
by the tensor method based upon a practical tensor model converges locally and three-step 
Q-superlinearly to the solution with Q-order f. In the same situation, it is known that 
standard methods converge linearly with constant converging to Hence, tensor methods 
have theoretical advantages over standard methods. The analysis in [13] also confirms that 
tensor methods converge at least quadratically on problems where the Jacobian matrix at 
the root is nonsingular. 

4 Tensor-GMRES method for nonlinear equations 

4.1 Introduction 

As reviewed in the previous section, traditional tensor methods foT nonlinear equations are 
more successful both in theory and in practice than standard methods, notably for their fast 
local convergence and their efficiency in arithmetic cost per iteration. Nevertheless, since 
these methods are based on the factorization of the Jacobian matrix (or its approximation), 
they may not be attractive for many classes of large and sparse systems. Newton-Krylov 
methods are very effective in many situations for nonlinear equations problems. They often 
have fast local convergence as analyzed by Brown and Saad [4]. However, the fast local 
convergence can be impaired when the Jacobian matrix is singular or ill-conditioned at the 
solution. This is not uncommon in practice and accounts for a substantial amounts of the 
failures of Newton-like methods. The tensor method introduced here is primarily intended 
to improve upon the Newton-Krylov methods in cases where the Jacobian matrix is singular 
or ill-conditioned at the solution. This is done in a fashion similar to the Newton-GMRES 
method that avoids the requirement of the factorization of the Jacobian matrix by the 
traditional tensor methods. As a matter of fact the Jacobian matrix is never explicitly 
needed. In addition, the new method will have similar efficiency in arithmetic cost per 
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iteration to Newton-GMRES method. 

The tensor model considered here uses only one past iterate information. There are three 
reasons for this. First, from the computational experience with tensor methods for nonlinear 
equations, the tensor methods that use one past point are easier to implement and have 
more satisfactory computational performance in practice. Second, from theoretical point 
of view, tensor methods based on single past point are better understood. Third and more 
importantly, the tensor model based on using more past points may require significantly 
more storage than the Newton-GMRES method since the latter only requires 0(N)(N being 
the size of the problem) memory locations and the storage of each past iterate information 
requires O(N) locations as well. The situation is much different for the traditional tensor 
model based methods where p, the number of past points used, can be as high as y/N. 

The major difference between the new tensor model and the traditional tensor model 
is that the new tensor model has a more restricted second order term. The analysis of 
tensor methods for nonlinear equations by Feng, Frank and Schnabel [13] indicates that 
tensor methods will not lose fast local convergence on singular problems if the tensor term 
is projected into proper subspaces. As a matter of fact, we can show that for problems 
where the Jacobian matrix has rank deficiency one at the solution, if the second order 
term in the tensor model is projected into the subspace spanned by the left singular vector 
corresponding the smallest singular value of the Jacobian matrix, the theoretical results 
given in [13] remain intact. This means that methods based on the tensor model with the 
projected second order term will have fast local convergence on singular problems. This 
is the theoretical foundation of our tensor-GMRES method. The idea of projected tensor 
methods was first implemented in [12] for constrained optimization, where the projection is 
taken place in the variable space. The difference here is that the projection occurs in the 
function space. 

This section is organized as follows. We first analyze two ideal tensor models that are 
closely related to our new tensor-GMRES model in Subsection 4.2. The formation and 
solution of the tensor-GMRES model is discussed in Subsections 4.3 and 4.4. We give a 
work comparison between the tensor-GMRES method and the analogous Newton-GMRES 
method in Subsection 4.7. 


Some notation is also useful to us. Throughout this section we use N to denote the 
number of unknowns in the Newton equation and NZ to denote the number of nonzero 
elements in the Jacobian matrix. 

Let F\x c ) = U c D c Vj be the singular value decomposition of F f (x) at z c , where U c = 
K, u%, • • • , w^], V c = [vf, »£,•••, v%], and D c = diag[a\, cr%, ■ ■ ■ , a c N ), with a\ > a c 2 > • • • > 
a N > 0 being the singular values of F'(x c ) and . {wf} being the corresponding left and 
right singular vectors. 

Similarly, let F'(x«) = UDV T . Let v and u be the right and left singular vector of 
F'(x*) corresponding to the zero singular value, when F'(x„) has rank deficiency one. 
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4.2 Analysis of tensor methods using projected second order derivatives 

The sequence of iterates produced by our algorithms is invariant to translations in the 
variable space. Thus no generality is lost by making the assumption that the solution 
occurs at a:, = 0, and this assumption is made throughtout this subsection. We also assume 
that vf/ and u c N are so chosen that ||r^ — v|| = 0(||^c||) an d ll u v — M ll = ^(ll x c||)> whenever 
x c is sufficiently close to x„. This assumption is valid from the theorems about continuity 
of eigenvectors in Ortega [20] and Stewart [27], as long as F'(x ) is continuous near x„, and 
has rank deficiency no greater than one at x*. 

Before going into the details of the analysis, we give Assumption 4.0, a group of as- 
sumptions that will be invoked for the remainder of this subsection in every result involving 
F (x). These assumptions basically state that neaT x«, the second order term supplies useful 
information in the null space direction of F'(x*), where F'(x„) lacks information. 

Assumption 4.0 Let F : $t N -* have two Lipschitz continuous derivatives. Let F(x») = 
0, F'(x.) be singular with only one zero singular value, and let u and v be the left and right 
singular vector of F'(x m ) corresponding to the zero singular value. Then we assume 

u T F"(x*)vv ^ 0 (4-1) 


where F"(x.) 6 ft**** 1 *. 

Assumption 4.0 is satisfied by most problems with rank(F'(x+)) = N — 1, and has 
been assumed in most papers that analyze the behavior of Newton’s method on singular 
problems. When N = 1, Assumption 3.0 is equivalent to /"(a;*) ^ 0. 

Suppose we know the right and left singular vectors v c N and u c N corresponding to the 
least singular value of F'(x c ) where x c is the current iterate and ||u^|| = I! = 1* Then 
an excellent tensor model around x c , if one is to utilize just a rank-one second order term, 
is 

Mt un {*c + d) = F(x e ) + F'(x c )d + u c N u c N T )a c (vffd ) 2 , (4.2) 

where a c - F"(x c )v c N v c N , because it contains the correct second order information where 
the Jacobian contains the least information, and correspondingly, where the second order 
term has the greatest influence. 

Based on (4.2), a simple tensor algorithm, Algorithm PT, is designed. 

Algorithm PT: Projected Tensor Algorithm. 

IF (4.2) has real roots THEN 

d <— dR where dp solves Mx ujv (x c + d) = 0 

ELSE d <- dju where d m minimizes \\Mt un (x c + d)|| □ 

Since (4.2) is the basis for our new tensor-GMRES model, we give an analysis of Algo- 
rithm PT. 
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Corollary 4.1 Let Assumption 4-0 hold and {xk} be the sequence of iterates produced by 
Algorithm PT. There exist constants K\, K 2 such that i/||x 0 || < K\, then the sequence {x fc } 
converges to x* and 


for k = 0, 1,2, • • •• 

Proof. Note that I = YliL l u i u i T and J c = Eili From the orthogonality of «?, 

i — 1 , • * * , N , we have 

M TuN (x c + d) 

= (52 u c x uf){F c + 52 ff i u i v f d + \{u c N u c N T )a c {v c N T d ) 2 ) 

t=l 1=1 

= l 52 (v!vfd + ufF c X] 

1=1 

+( <T N v N T d + u% t F c + ^u^^a^u^^d) 2 )^. (4-3) 

Note that the difference between (4.3) and (4.2) of [13] is only a second order term in the 
coefficient of each u x for i — 1, • • • , TV - 1, which does not effect either of the proofs of 
Lemmas 4.1 and 4.2 of [13]. The rest of the proof can be completed by following exactly 
the proof of Theorem 4.4 of [13]. □ 

Now we look at an interesting tensor model that is closely related to (4.2). Let W G 
sfcNxm m < N orthonormal columns. Consider the tensor model 

M Tw (x c + d) = F(x e ) + F'(x e )d + ±(WW T )a c (vff d)\ (4.4) 

where u c N - Wy for y / 0, or u c N is in the span of the column vectors of W. 

Corollary 4.2 Let Assumption 4-0 hold and {x^} be the sequence of iterates produced by 
Algorithm PT with u c N being replaced by W. There exist constants K \ , K 2 such that if 
||xo|| < K x , then the sequence {x^} converges to x* and 

Il*fc+2|| < *2||* fc ||t 


for Ar = 0,1,2,-*-. 

Proof. Since U N = Wy , from the orthogonality of columns of W , we have 

u c N u c N T WW T = u c N y T W T WW T = u c N y T W T = u c N u c N T . (4.5) 
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By similar reasoning as stated in the proof of Corollary 4.1, and using (4.5) 

M Tw (x c + d) 

= (£»f)(F c + + \(WW T )a c {v c N T df) 

1=1 t=l 

= [ E + < Tp ' + i 2 < T (ww T ) ac (v c N T d) 2 K\ 

+(a c N v c N T d + u c n t F c + \u c N T a c {v c N T df)u c N . ( 4 . 6 ) 

Again, note that the difference between (4.6) and (4.2) of [13] is only a second order term 
in the coefficient of each u? for i = 1, • • *, N - 1, which does not effect either of the proofs 
of Lemmas 4.1 and 4.2 of [13]. The rest of the proof can be completed by following exactly 
the proof of Theorem 4.4 of [13]. □ 

4.3 Formation of the tensor model 

At the current iterate x c , assume that the Newton equation 

F c + J c d n = 0 (4-7) 

is solved by the GMRES method with d n = V m y m (assuming starting from zero). Let H m 
be the (m + 1) X m Hessenberg matrix from the Arnoldi process. An interesting fact is that 
the resulting Newton step d n is in the span of the column vectors of V m . The analysis of 
Feng, Frank and Schnabel [13] indicates that when the Jacobian matrix has a null space of 
dimension one at the solution, close to the solution, the Newton iterates fall into a funnel 
around the null space. In this situation, their theory also implies that the angle between d n 
and u£, the right singular vector corresponding to the smallest singular value of the current 
Jacobian matrix J c , will be arbitrarily close to zero, close to the solution. As a consequence 
of d n = Vmym, v n will be arbitrarily close to being in the span of the column vectors of 
V m - Consequently, u% that is in the same direction as J c v c n will be arbitrarily close to being 
in the span of the column vectors of J c V m . Hence a good approximate to the projection 
matrix WW T in (4.4) would be the projection matrix 

p = (J c Vm)[(JcV m ) T (JcV m )}-\JcVm) T . ( 4 - 8 ) 

The singular vectors and the exact second order derivatives used in (4.4) are normally 
too expensive to obtain. We approximate them in the following manner. As in the situation 
of the traditional tensor model (3.4), let s c = x p -x c , the difference between the past iterate 
x p and the current iterate x c . There are two choices for approximating v c N in (4.4), One 
is using d n /||d n || since the Newton step d n is likely to be along the null space close to the 
solution for singular problems. Another one is using h = s c /||s c || since when consecutive 
iterates are in the funnel around the null space near the solution, the difference between the 
two consecutive iterates is also likely to be along the null space. We choose to use h because, 
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as we will see later, this will cause our tensor model to interpolate a past point in a projected 
function space. The ability of interpolating past points is vital for the success of traditional 
tensor methods for nonlinear equations on both singular and nonsingular problems. The 
term a c = F"(x c )v c N v c N in (4.4) can be approximated by 


2 (Fjxp) - Fjxc) ~ J(zcK) 

sjs c 


F"{x c )hh + E, 


(4.9) 


where \\E\\ = 0(||s c ||). Equation (4.9) is standard in tensor model formation, which requires 
no extra function or Jacobian matrix evaluations. 

Putting all the pieces together, we arrive at the following tensor model. 

M Tp {x c + d) = F c + J c d+\Pa{h T df, (4.10) 


where P is given by (4.8). It is easy to verify that the unprojected tensor model 

Mt{x c + d) = F c + J c d + \a(h T d) 2 


(4.11) 


interpolates the function value at the past point x p . Hence, from 


PMt p {x c + d) 


PF C + PJ c d + \PPa{h T d f 
PF C + PJ c d+ \Pa{h T d) 2 
P(F c + J c d + \a(h T d) 2 ), 


the interpolation property of the full tensor model (4.11) implies that the projected tensor 
model (4.10) interpolates F(x) at the past x p in the subspace resulted from the projection of 
P onto the full function space. It is easy to see also that the tensor model (4.10) interpolates 
F(x) at the current point in full function space. A second property of (4.10) is that when 
m = A, The projector matrix P is equal to identity, which recovers the full tensor model 
(4.11). 

The formation of the tensor model (4.10) requires storage of two N- vectors F p and 
x p . The work of calculating a and h requires a matrix- vector multiplication costing NZ 
multiplications, and the evaluation of |j.s c || costing N multiplications. Since as we will see 
later, we do not have to form the projection matrix P explicitly, we will count the cost of 
calculations involving P to the cost of solving the tensor model, which will be addressed in 
the next subsection. 


4.4 Solution of the tensor model 

Solving the tensor model (4.10) in the full variable space is not preferable since it would 
be as expensive as solving the full tensor model. As discussed in Section 1, solving the full 
tensor model could involve calling a Krylov method for linear equations twice, which could 
be expensive in many situations. An alternative is to solve (4.10) in the Krylov subspace. 
When the Jacobian matrix lacks the first order information, close to the solution, the major 
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error of the current iterate would likely reside in the space spanned by v c N . Hence, we would 
like the tensor step to be along v c N so that it would have the biggest effect on reducing the 
error. Since v% is arbitrarily close to being in the span of the columns of V m , it is reasonable 
to require that the tensor step be in the span of the columns of V m . 

Therefore we would like to solve the least squares problem 

min || F c + J c z + \Pa(h T z) 2 \\, ( 4 - 12 ) 

Z$.K. m 

which is equivalent to solving 

min ||F C + J e V m y + \Pa{h T V m y) 2 \\. (4-13) 

yex m 

RecaU that J c V m = V m+l H m . Let H m = Q m Rm be the QR factorization of H m . (Note 
that Q m is the product of m Givens rotations; for details see [24].) From (4.8), 

P = V m+ lQmRm[{V m+ iQmR m ) T (V m+ lQ m R m )] 1 {V m +lQ m Rm) 

— Vm+lQmRm{RmR m) ^mOm^m+ 1 

= < 4 ' 14) 

Using (4.14) and r 0 = -F c , and letting b be the first m components of Q^V^ +1 a, (4.13) is 
equivalent to 

mm || — ro + V m +iH m y + ^V m +iQ m ^ q ^ VmV) II 

- min || V^j. 4-1 ( ll^'oll^l _ OmR-mV — \Qm ( Q ] (^ Vmy) )|! 

V6» m \ / 

= min IIQmtQmlMlei - RmV ~ \ ^ q ^ ( h T V m y ) )|| 

= min ||tt> — R^y — b(h T V m y) 2 \\ (4.15) 

y6X m 

where w is the first m components of Q£|M|ei and R} m is the first m rows of R m . An 
interesting feature of (4.15) is that if the system of quadratics has a root, the minimum norm 
of the tensor model in the Krylov space will be the absolute value of the last component 

of<&o||ei, which is the same as the residual norm of the Newton equation solved by the 

GMRES algorithm. When R^ is nonsingular, using the techniques for solving the tensor 
model developed in [2], we form the (3 function 

,(fl) = h T V m (R l m r'^->> T V m y-^ T V m (Rlr'b(h T V m y) 2 

= h T (S} m )-'w-l3-\h T (R' m )-'bP 2 , ( 416 > 
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(4.17) 


where j3 = h T V m y and h = V^h, and solve the minimization problem 

J&WM- 

Let /?» be a solution to (4.17). By the theory established in [2], (4.15) is solved by 
yt = [(Rl) T (Rl i )]- 1 hq(0*)/u + {RLr'u - 

where u = h T [(#™) T (#m)]~m. Then the tensor step for the system of nonlinear equations 
is given by 

d l = v m yl (4.18) 

Compared to the solution of the Newton equation by the GMRES algorithm, the solution 
of the tensor model requires a minimal amount of extra work. The major extra work comes 
from forming d* from (4.18) and the calculation of V^h and V^a, each requiring mN 
multiplications. In addition, forming the (3 function requires one extra backsolve of an 
m x m triangular system and two dot-products of vectors of length m, costing |m 2 + 
2 m multiplications; forming b needs an applications of m Givens rotations, costing 4m 
multiplications; forming [(^J n ) T (^J n )]“ 1 /i requires two backsolves of an m x m triangular 
system, costing m 2 multiplications, and then forming uj needs a dot-product of two m- 
vectors, costing m multiplications. In summary, the total extra cost of solving the tensor 
model is 3 mN + |m 2 + 7m. 

4.5 Dealing with d 0 ^ 0 

Because of the memory limitation, it is necessary for the GMRES algorithm to restart. As 
a result, the solution to the Newton equation obtained from the GMRES algorithm is given 
by d n = do + V m t/ n . In this situation, d n is in the span of {do,V m } y instead of in the span 
of {Kn} only, as in the situation of do = 0. For reasons discussed before, we would like 
the tensor step to be in the span of {do? km}* Consequently, we would like the projection 
matrix P to be onto the subspace spanned by {J c V m , J c do}. One choice of such a P is 

P = M(M t M)~ 1 M t 

where M = J[V m ,do\. Hence we would like to solve 

. min \\F C + J c d+ \Pa(h T d) 2 \\. (4.19) 

Let d = V m y + d 0 T. Using r 0 = —F c — J c d 0 , (4.19) is equivalent to solving 

y6 nun € J| F c + J c [V m ,d 0 } ( V ) + \Pa{h T [V m ,d 0 ] ( » )} 2 || 

= ^nun e J| F c + [JcV m ,Jcdo] f 0 + \Pa{h T [V m ,d 0 ] ( \ )} 2 || 
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= mm 
yeK 


in^|| F c + [J c Vm,-F c - r 0 ] ^ ® ^ + \Pa{h T [V m , do] ^ ^ ^ } 2 
rnin^||F c + [V m +\H m ,g] ^ ^ ^ + \Pa{h T [V m ,d 0 ] ^ r ^ } II) 


(4.20) 


where 5 = — F c — ro. An important feature of (4.20) is that it does not involve the Jacobian 
matrix J c . We simplify (4.20) to 


min || F c + Jy + \a(s T y) 2 1|, 

r»m-M 


(4.21) 


y& 


where J = [V m+1 H m ,g], y = f J J , a = Pa and s = [V m ,d 0 ] r /i- We discuss how to form 

and solve (4.21) efficiently. The solution of this type of nonlinear least squares problem is 
studied by Schnabel and Bouaricha in [25]. Their theory shows that when J has full rank 
the solution of (4.21) is given by 

y, - (J T J)~ x s q(/3*)/u — (j T J)~ l J T (F c + (4.22) 

where y(/3) and u are defined by 

q{0) = f(j T j)~'j T F c + /3 + \s T (j T jf'j T tiH 2 , (4.23) 

u = sU T j)-'S, (4-24) 

and the value of /?, is determined from 

min || q(fi)/ V^|| 2 + |H/3)|| 2 , ( 4 - 25 ) 

where ||rc(/3)|| 2 = ||(F C - P F c ) + ^( a - Pa)0 2 1| 2 . 

In our situation, since P is a projector matrix and 

a — Pa = Pa — PPa = Pa — Pa = 0, 

n(/3) is a constant function. Hence the minimization problem (4.25) is equivalent to 

min \\q(0)\\. (4.26) 

To obtain y, the critical computational work comes from the factorization of J T J, 
since when this factorization is available, all the computations involving (J T J)~] can be 
achieved through backsolves. For this reason, we discuss the factorization of J J. Recall 
that J — [V^n-)-i H m ,g] and H m = Q m R m . Hence we have 

JT j = H m , g]T[V m +\ H m , g] 
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( tilH m HlV^g \ 
\ g T V m +iH m g T g J 


_ ( RlR m H£vZ +1 g\ 

V g T V m+1 H m g T g ) 



where Rl a is the first m rows of R m , w = Qm^m+ig-i an d 7 = V 9^ 9 ~ wT w. The factor- 
ization is possible since J T J is always at least positive semidefinite. After y * is obtained, 
(4.19) is solved by 


d* - [V mi d 0 ]y*. 

However, the calculation of expressions involving (J T J)” 1 is impossible if 7 = 0. We 
discuss how to overcome this difficulty. Since J = [V m +\H m ,g] and V m +\H m has full rank, 
g has to be in the span of { V m +\H m }, which implies that J c do = g is in the span of 
J c Vm = V m + \Hm- When J c has full rank, this implies that do is in the span of {F m }, which 
in turn implies that d n = do + V m y n is in the span of V m y n . In this situation, based on 
previous discussions, we actually would like to solve the tensor model (4.10) in the Krylov 
subspace V m only, i.e. 


min ||F C + J c (d 0 + z) + \Pa{h T (d 0 + z)) 2 ||, (4.27) 

where P is given by (4.8), which is equivalent to solving 

min ||F C + J c d 0 + J c V m y + \Pa(h T (d 0 + V m y)) 2 \\. (4.28) 


Using J c V m = V m +iH m , H m = Q m R m , r 0 = — F c — J c d 0 and (4.14), and letting b be the 
first m components of Q^V^^a, (4.28) is equivalent to 

min || - r 0 + V m + x H m y + \ V m +iQ m ^ q ^ ( h T (d 0 + V m y)) 2 1| 

= ™ ||V m+ i(||r 0 ||ei - Q m R m y - \Q m ^ J j (h T (d 0 + V m y)f) || 

= ||Q m (0mll r o|l e i - RmV ~ \ ( J j (h T (d 0 + U m 2/)) 2 )|| 

= min \\w - R^y - b(h T (d 0 + V m y)) 2 || (4.29) 

where w is the first m components of Q^|| r o||^i and R ^ is the first m rows of R m . Again 
using the techniques for solving the tensor model of nonlinear least squares developed in 
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(4.30) 


[2], we form the (3 function 

q(j3) = ^VrniRl.^W-^Vrny-l^VraiRl.^bi^ido + Vn.y)) 2 

= ^(R^w + ^do-P-Y^iRl)-^ 2 , 

where f3 = h T (d 0 + V m y) and h = V£h, and solve the minimization problem 

min||g(/3)||. ( 4 - 31 ) 

Let /3, be a solution to (4.31). By the theory established in [2], (4.29) is solved by 

vt = [AfAT'hiP*)/* + AT 1 ™ - \Ar l A 

Then the tensor step for the system of nonlinear equations is given by 

d t = d 0 + V m yl (4-32) 

We examine the cost of solving the tensor model when d 0 ^ 0. Since the situation of 
7 = 0 is less expensive to deal with, we will concentrate on the situation of 7 ^ 0. Compared 
to the solution of the Newton equation by the GMRES algorithm in a similar situation, the 
solution of the tensor model requires a minimal amount of extra work. The major extra 
woTk comes from forming d\ J, J T a and J T F c , each requiring (m + 1)N multiplications 
(note that J T a = J T Pa = J T a from the definition of P ). Since V£ +1 g = V m+1 (-F c - r 0 ) = 
_V £ +1 F c - ||r 0 ||ei, given Uj + 1 F C , cost of V% +1 9 is onl y a sin S le subtraction. Hence the 

cost of factorization of j T J, which involves the calculation of Qm^m+iSi 9 7 9 an ^ w T w, 
is JV + 5m multiplications. The operation count is accumulated from an application of 
m Givens rotations costing 4m multiplications, a dot-product of 2 IV-vectors costing ^/V 
multiplications and a dot-product of 2 m-vectors costing m multiplications. Given s, J a 
and J T F c , the major cost of forming q{(3) and u defined in (4.23) and (4.24) respectively, 
comes from the calculation of s T (J T J)~* . Using the available factorization of J J , this can 
be done by two backsolves of (m + 1) X (m + 1) triangular systems, which costs (m + l) 2 
multiplications. After s T (J T /)- 1 is obtained, the cost of forming q{fi) and w is three dot- 
products of two m-vectors costing 3m multiplications totally. The cost of obtaining y r using 
(4 22) needs two extra backsolves of (m-f l)x(m+l) triangular systems, which costs (m+1) 
multiplications, given s T (J T J)~\ J T a and J T F c . In summary, the total extra work required 
by solving the tensor model when the d ^ 0 is at most (4(m + 1) + 1)JV + 2(m + l) 2 + 8m 
multiplications. 

4.6 Preconditioning and matrix free implementation 

The success of the GMRES method on a system of linear equations usually depends on 
a good preconditioner. The formation and solution of the tensor model is consistent with 
preconditioning. When a preconditioner M is used in solving the Newton equation by the 
GMRES algorithm, it turned out that the only thing we need to do in the tensor step 
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calculation is to replace a by M~ x a, and F c by M~ X F C (in case when d 0 ± 0). The rest of 
the solution procedure is unchanged. 

Compared to the Newton-GMRES method, the only extra computation involving the 
Jacobian matrix in the tensor-GMRES method is the computation of Js in the formation 
of the tensor term. In a Jacobian free implementation, this matrix-vector product can be 
approximated by the finite difference formula specified by (2.5). Hence the tensor-GMRES 
scheme is consistent with matrix free implementation. 

4.7 Work comparison 

If m steps of GMRES is required for solving the Newton equation, the computational cost 
of the Newton-GMRES iteration is m(m + 2 )N + mNZ multiplications, and the storage 
requirement is (m + 2)jV [24]. The extra storage required by the tensor-GMRES method is 2 
N- vectors. As analyzed at the end of Subsection 4.3, the extra computational cost of forming 
the tensor model is N + NZ. As analyzed in Subsection 4.5, in the most expensive case the 
extra computational cost of solving the tensor model is (4(m + 1) + 1 )N + 2(m + l) 2 + 8m 
multiplications. Hence the combined extra computational cost of formation and solution 
of the tensor model in each tensor iteration, compared to an iteration of Newton-GMRES 
algorithm, is at most (4(m + 1) + 2 )N + 2(m + l) 2 + 8 m + NZ multiplications. If we 
count the operations in flops (counting both multiplications and additions), the total extra 
computational operations are at about twice as many. Compared to m(m + 2 ) A + mNZ , 
the cost of solving the Newton equation using the GMRES method, this extra cost is not 
significant if m is relatively large. 

Because of the memory limitation, it is likely that m is not too large. Hence it is 
necessary to restart the GMRES algorithm. However, we should point out that the tensor 
model is not formed until the Newton equation is approximately solved by the GMRES 
algorithm, or in other words, until the Krylov space that contains the solution to the 
Newton equation is found. We form the tensor model only using the Krylov subspace 
generated in the last restarted GMRES algorithm. The tensor model has nothing to do 
with the intermediate Krylov spaces generated by the GMRES algorithm resulted from 
restarts before the final restart. Therefore compared to the total cost of the Newton- 
GMRES with restarts, the extra cost of formation and solution of the tensor model is likely 
to be minimal for a large portion of nonlinear problems, particularly hard problems that 
need many restarts of the GMRES algorithm. 

5 Implementation and testing 

In the previous section we presented the main new features of our tensor-GMRES method 
for nonlinear equations, namely, how we form the quadratic model of the nonlinear function, 
and how we solve this model efficiently. In this section first we give the complete algorithm 
we have implemented to test these ideas and clarify various aspects of this algorithm and its 
computer implementation. Then, we present some results of testing the method on several 
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problems. 

This section is organized as follows. Subsection 5.1 gives a complete tensor-GMRES al- 
gorithm for systems of nonlinear equations and discusses the implementation of each step in 
details. In Subsections 5.2-5.4 we will show comparative test results for the tensor-GMRES 
algorithm, Algorithm TG, given in Subsection 5.1 versus the Newton-GMRES algorithm, 
Algorithm NG, given in Section 2 with the same implementation. Three distinct test prob- 
lems., i.e., a Bradu problem, the Broyden Tridiagonal problem and the one-dimensional 
Euler equations problem, and several of their variants, were used in the testing. The tests 
on the Bradu problem, the Broyden Tridiagonal problem and their variants were performed 
on a Sun Super Workstation I1+/50 at RIACS using MATLAB. The test on the one- 
dimensional Euler equations problem was performed on a Cray Y-MP at NAS facility using 
Fortran 90. Test results are summarized and discussed in Subsection 5.5. 

5.1 A complete algorithm 

This subsection presents Algorithm TG, the full algorithm of the tensor-GMRES method 
and discusses implementation issues of this algorithm in details. 

Algorithm TG. An iteration of the tensor-GMRES method. 

Given Xk £ %l N , Xk- 1 , Jk £ Fk £ and Fk-\ £ • 

(TG-1) Decide whether to stop. If not: 

(TG-2) Set s = x k - 1 - **, a = 2 (F k -i - F k - Jks)/(s T s) and h = «/IMI- Choose 

fit G [0,1). 

(TG-3) Do GMRES (restart if necessary) to find d n = do + Pn»y n such that 

|| F k + J k d n || < € fc , 

where do is the starting point of the last restarted GMRES procedure, and the 
columns of V m form an orthonormal basis for the Krylov space generated by the 
corresponding Arnoldi process. In addition, let H m be the Hessenberg matrix 
generated from the Arnoldi process, and H m = Q m Rm t> e its QR- factorization. 

Let R ^ be the first m rows of R m . 

(TG-4) If d 0 = 0 or d 0 £ {V m } then 

Solve 

min \\Fk + Jkdo + Jk^mV F ^{^(do F V m y)} 2 \\, (5-1) 

where a = (J k V m ){(JkV m ) T { JkV m )}-\JkV m ) T a, by first solving 
hi(R]„)~ l w + h T do — (d — \hJ(R m ) b(3 )||, 
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to obtain a solution /?*, where w is the first m components of Qm\\Fk + 

Jkdo\\eu b ls m components of QmVm+i a > an d h\ — V^h. 

Then the solution to (5.1) is given by 

where w = hJUR^fiR^^h i. 

Form the tensor step d t = do + V m yl. 

Otherwise (d 0 ^ 0 and do £ {V™}), 

Solve 

mm +i 11^ + Jy + \a{hly)\ (5.2) 

where j = [V^+i 

a = («7[V^, d 0 ]){(^[^„, do])}-^ ^[V^, rfo]) 7 ’® 

and h 2 = [F m ,c!o] T ^- This is done by first solving 

mjn M)(= hl{J T j)~ l j T F c + /3 + \hl{j T j)~ l j T a(3 2 ) || 
with solution /3*. Then the solution to (5.2) is given by 

y . = (J T J)- X h 2 q 2 {^)lu - (J T J)-'J T (F c + 1 aPl), 

where u> = h^J 7 J)~ l h 2 - 

Form the tensor step d l — [V m ,do]y*- 

(TG-5) Choose a new step d from d n and d*. 

(TG-6) Find A > 0 using a backtracking line search global strategy and form the next 
iterate xjt+i = Xk + A d. 

Several tests are performed to determine whether to stop the algorithm in Step (TG-1). 
These stopping criteria are described by Dennis and Schnabel in Chapter 7 of [11]. For the 
sake of simplicity, we only use simplified versions of their criteria. The first test determines 
whether Xk solves the problem (1.1). This is accomplished by using ||F(xjt)|| < FTOL, where 
a typical value of FTOL is around 10“ 5 , but the users can specify their own values for this 
tolerance. In our tests this value was set to 10 -12 since we wanted to push the algorithms 
to their limits. The second test determines whether the algorithm has converged or stalled 
at Xfe. It is done by measuring the relative change in the iterates from one step to the next. 
We use ||a?jt - £fc-i||/||zfc-i|| < STPTOL, where a typical STPTOL is around 10 -8 in our 
implementation. Finally, we test if a maximum number of iterations is exceeded. Currently 
this value is 150. 
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In Step (TG-2), we need to choose the tolerance c fc , which is passed to the GMRES 
algorithm when it is called to solve the Newton equation at the fcth iteration. In [5], Brown 
and Saad suggested a sequence ek = T]k\\F(xk)\\, where r/k = (I)* for k — 1,2, •••. Since 
a good sequence is normally problem related, and again for the sake of simplicity, we use 
unchanging c* = 10~ 8 at every iteration for our test problems. 

Step (TG-3) calls the normal GMRES linear equations solver. The number of Arnoldi 
iterations allowed between restarts is usually set to 20, and the number of maximum restarts 
of GMRES allowed is usually set to 150. However, these two values can be provided by 
the users based on their experience. When the tolerance is not reached after a maximum 
number of restarts, we simply go ahead and use the last computed data. When it returns, 
the GMRES algorithm readily provides R ^ and Q m that is presented in m Givens rotations. 
One by-product of Step (TG-3) is the Newton-GMRES step. 

Step (TG-4) calculates the tensor-GMRES step. It is basically a concised reiterate of 
the solution of the tensor model described in the previous section. In Algorithm TG, the 
solutions of the two situations when do = 0 an d do € {Em}> are combined for succinctness. 
The minimization of a quadratic function in one variable is done by using standard root 
formula. When a quadratic function has two distinct roots, the root that is smaller in 
absolute value is chosen. 

Step (TG-5) usually consists of choosing the tensor step direction d* obtained in Step 
(TG-4). However, the Newton step direction is chosen instead, when the tensor step di- 
rection is not a descent direction for |||F(a:)|| 2 , which rarely occurs in practice but is not 
preluded in theory. Since the gradient of ^||F(a:)|| 2 is J(x) T F(x), d 4 is a descent direction 
if (d*) 7 J(x) T F(x) < 0. We discuss how to compute this expression efficiently. At cur- 
rent iterate x c , on the one hand, when do = using d* = V m y , JcVm — + i II m and 

F c = -||F c ||t>i yields 

Jj F c 


The cost of calculating (5.3) is 
[Em* do]?/*, JcVm = Em+i and 


= (I f m yyj l c F c 
= (y'fiJcVmfFc 
= (y t ) T (V m+ 1 £ m ) T F c 
= (£ m2 /) T (v£ +1 F c ) 

= — (iW) T ||F c ||ei. (5-3) 

minimal. On the other hand, when do ^ 0, using d 1 = 
r 0 = -F c - J c d 0 yields 

(d*) T jJ F c 

([VrmdotffjjFc 

(y'fiJcVmiJcdofFc 

(y‘) T [y m+ 1 F m ,-r 0 -F c ] T F c 

(~t\T ( Hm V £+-iF c \ (5.4) 

(V > -rtF c - IIFJ 2 ' ' 
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The major work in (5.4) is the calculation of V% +1 F C . However, since this calculation is 
already done in the solution of the tensor step, no extra cost is necessary. 

Finally, in Step (TG-6), we use a standard quadratic backtracking line search algorithm 
which is given below as Algorithm QB. The merit function i||F(z)|| 2 is used for measuring 
the progress towards the solution. 

Algorithm QB Standard quadratic backtracking line search algorithm. 

Given a current point x c , a search direction d, the directional derivative £, and a = 10 -4 . 

(QB-1) Set f c = i||F(x c )|| 2 and x+ = x c + d. 

(QB-2) Set f + = ±||F(x + )|| 2 and A = 1. 

(QB-3) While /+ > f c + a • A • £ do 

Atemp = ~£/(2[/+ ~ fc~ ^]); 

A — A/10) , 

X — X c 

/ + = I||F(z + )|| 2 

EndWhile 

When d n is chosen in Step (TG-5), the directional derivative is given by (d n ) T Jj F c . Its 
calculation can be accomplished in a fashion similar to when d 1 is chosen. When do = 0, 
we can simply replace y l in (5.3) by y n and calculate -(^ m y n )||F c ||ei. When d 0 ^ 0 

(d n ) T jJ F c 

= (d 0 + V m y n ) T JjF c 

= (J c d 0 + JcV m y n ) T F c 

= (-F c -r 0 + V m+1 H m y n ) T F c 

= -\\F c \\ 2 -rTF c + (y n ) T HZ(v£ +l F c ), 

which is easy to calculate given V^ +l F c . 

5.2 Test results for a Bradu problem and its variants 

As a first test problem we choose to solve the nonlinear partial differential equation 

-Am = Ae u in fl, u = 0 on T, (5.5) 

where A = V 2 = the Laplace operator, = (0,1) X (0,1) and T is the 

boundary of ft. This version of Bradu problem is chosen from a set of nonlinear model 
problems collected by More [18]. 


24 




Figure 1: Results for the Bratu problem, A = 6.5. Diagonal preconditioning. Solid line: 
tensor-GMRES; Dotted-line: Newton-GMRES. 

With n a positive integer we define h = 1 /(n + 1) and then a mesh is given by 

M tJ = {ih,jh}, 0 <i,j<n. 

To approximate problem (5.5) we use the following finite-difference scheme: 

Ui+ij ± ~ = 1 <i,j<n (5.6) 

UM = 0, if Mki G T. 

In (5.6), Uij is an approximation to u(M,y). For A < 0, (5.5) has a unique solution. For 
A > 0, (5.5) may have one, several, or no solutions. In this test, we took n = 32 and A = 6.5, 
which yields a system of N = 1024 equations in N unknowns. We tested the tensor-GMRES 
method given by Algorithm TG and the Newton-GMRES method given by Algorithm NG 
described in Section 2. In both algorithms, the standard diagonal preconditioning was used. 
The number of Arnoldi iterations allowed between restarts was set to 20 when the GMRES 
linear solver was called. The initial guess was chosen as uo = 0. 

Figure 1 shows that the tensor-GMRES method has a slight improvement over the 
Newton-GMRES counterpart in number of nonlinear iterations. Although this problem is 
quite easy to solve for both methods and both methods exhibited quadratic convergence, 
the tensor-GMRES was converging a little bit faster. It took the tensor-GMRES method 5 
iterations while it took the Newton-GMRES method 6 iterations to reach almost the same 
level of accuracy. 

Since it is difficult to find large singular systems of nonlinear equations in the literature, 
we constructed all the singular test problems on our own. We give the test results for a 
rank one deficient modification of the Bradu problem. This singular problem is constructed 
by squaring the last equation in (5.6). The resulting problem has exactly the same solution 
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Iterations 


Figure 2: Results for the rank one deficient modification of the Bratu problem, A = -5. 
Diagonal preconditioning. Solid line: tensor-GMRES; Dotted-line: Newton-GMRES. 

as the original problem. At the solution, the last row of the Jacobian matrix is a zero row. 
The rest of the rows are unchanged. Since the Jacobian matrix of the original problem is 
nonsingular, the new Jacobian matrix has rank deficiency one. In this test, we took n = 32 
and A = -5, which gives a system of N — 1024 equations in N unknowns. We tested the 
tensor-GMRES method given by Algorithm TG and the Newton-GMRES method given by 
Algorithm NG with the same implementation. In both algorithms, the standard diagonal 
preconditioning was used. Again when calling the GMRES routine we limited the number of 
Amoldi iterations between restarts to 20. The initial guess was chosen as u 0 = [1, 1, • • -, 1] . 

Figure 2 shows that the tensor-GMRES method has a significant improvement over 
the Newton-GMRES counterpart in number of nonlinear iterations. The Newton-GMRES 
method is exhibiting linear convergence, while the tensor-GMRES method shows superlinear 
convergence. To reach the same accuracy, it took the tensor-GMRES method 7 iterations 
while it took the Newton-GMRES method 21 iterations. The margin of improvement in 
number of nonlinear iterations is more than 65%. One may notice the linear convergence 
behavior of the tensor-GMRES method at the last iteration. We suspect that this is caused 
by the round-off error. 

Next we give the test results for a rank two deficient modification of the Bradu problem. 
Since the formation of the tensor model uses only one past point, one should expect the 
tensor method to help significantly with problems where the Jacobian matrix has rank 
deficiency one. It would be interesting to see whether the tensor method can help with 
problems where the Jacobian matrix has rank deficiency greater than one. This is the 
motivation for the rank two deficiency modification of the Bradu problem. This singular 
problem is constructed by squaring the last two equations in (5.6). Again the resulting 
problem has exactly the same solution as the original problem. At the solution, the last 
two rows of the Jacobian matrix are zero rows. The rest of the rows are unchanged. Since 
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Figure 3: Results for the rank two deficient modification of the Bratu problem, A - -5. 
Diagonal preconditioning. Solid line: tensor-GMRES; Dotted-line: Newton-GMRES. 

the Jacobian matrix of the original problem is nonsingular, the new Jacobian matrix has 
rank deficiency two. In this test, we took n = 32 and A = -5, which yields a system 
of TV — 1024 equations in N unknowns. We tested the tensor-GMRES method given by 
Algorithm TG and the Newton-GMRES method given by Algorithm NG under the same 
implementation. In both algorithms, the standard diagonal preconditioning was used. We 
limited the number of Arnoldi iterations between restarts to 20. The initial guess was chosen 
as uq — [l, 

Figure 3 shows that the tensor-GMRES method again has a significant improvement 
over the Newton-GMRES counterpart in number of nonlinear iterations. The Newton- 
GMRES method is exhibiting linear convergence similar to the rank one deficient situation, 
while the tensor-GMRES method show superlinear convergence. The convergence pattern 
of the tensor-GMRES method is slightly different from the rank one deficient situation. The 
convergence here seems to be two-step superlinear, while in the rank one deficient situation 
the convergence seems to be one-step superlinear. It took the tensor-GMRES method 6 
iterations while it took the Newton-GMRES method 21 iterations to reach almost the same 
level of accuracy. The margin of improvement is over 70% in number of nonlinear iterations. 

5.3 Test results for the Broyden Tridiagonal problem and its variants 

The Broyden Tridiagonal problem is chosen from a standard test set of More, Garbow and 
Hillstrom [19]. The function is defined as 

/i(x) = (3-2x<)®«-a:»-i-2xi + i + l, for t = 1,- • -,n (5-7) 

where xo = x„+i = 0 and n can be any positive integer. A root of / = 0 is sought. For 
our test, we set n = 1000 which results in a system of 1000 nonlinear equations in 1000 


27 




unknowns. The Jacobian matrix has fuD rank at the solution. Three tests were performed 
on this problem. The standard starting point is Xo — [— 1, — 1, • • • , — 1]. When calling the 
GMRES routine we limited the number of Arnoldi iterations between restarts to 20. 



Figure 4: Results for the Broyden Tridiagonal problem. x 0 = [— 1, -1, • • •, — 1] T . Diagonal 
preconditioning. Solid line: tensor-GMRES; Dotted-line: Newton-GMRES. 

Figure 4 shows that the tensor-GMRES algorithm and the Newton-GMRES algorithm 
performed about the same, both exhibiting a quadratic convergence. Before the last step, 
the tensor-GMRES method was doing a little bit better, knocking down 1 or 2 more digits 
in function norm than the Newton-GMRES method. The last step of the tensor-GMRES 
broke the trend of the convergence. We believe that this is caused by the round off errors. 



Figure 5: Results for the Broyden Tridiagonal problem. x 0 = 10 * [-1, - 1, • • • , - 1] T . Diag- 
onal preconditioning. Solid line: tensor-GMRES; Dotted-line: Newton-GMRES. 
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Figure 6: Results for the Broyden Tridiagonal problem, x 0 = 100* [-1, -1, • • •, -1] T - 
Diagonal preconditioning. Solid line: tensor-GMRES; Dotted-line: Newton-GMRES. 

Since starting from the standard starting point x 0 seems too easy for both algorithms, 
we tried to start farther away from x 0 . 

Figure 5 shows the test results of starting from 10 * x 0 . We see that the tensor-GMRES 
method has a moderate improvement over the Newton-GMRES method in number of non- 
linear iterations. To reach a similar accuracy, it took the tensor-GMRES method 6 iterations 
while taking the Newton-GMRES method 8 iterations. 

Figure 6 shows the test results of starting from even farther with 100 * x 0 . This time we 
see that the tensor-GMRES method has a significant improvement over the tensor-GMRES 
method in number of nonlinear iterations. To reach a similar accuracy, it took the tensor- 
GMRES method 6 iterations while taking the Newton-GMRES method 12 iterations. The 
margin of improvement is 50%. It seems that for this problem the tensor-GMRES method 
is not sensitive to scaling up the starting point as the Newton-GMRES does. 

Next we give the test results for a rank one deficient modification of the Broyden Tridi- 
agonal problem. Again the problem was constructed by squaring the last function defined 
by (5.7). As discussed before this construction does not alter the solutions to the original 
system and results in a system whose Jacobian matrix has rank deficiency one at the so- 
lution. In this test, we took n = 1000, z 0 = [-1, -1, • • •, -1]- When calling the GMRES 
routine we limited the number of Arnoldi iterations between restarts to 20. 

Figure 7 shows that the tensor-GMRES method has a significant improvement over the 
Newton-GMRES method in number of nonlinear iterations. The Newton-GMRES method 
is exhibiting linear convergence, while the tensor-GMRES method shows superlinear con- 
vergence. To reach the same accuracy, it took the tensor-GMRES method 11 iterations 
while it took the Newton-GMRES method 22 iterations. The margin of improvement is 

5°%. D , 

Finally, we give the test results for a rank two deficient modification of the Broyden 
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Tridiagonal problem. The problem was constructed by squaring the last two functions 
defined by (5.7). As discussed before this construction does not alter the solutions to the 
original system and results in a system whose Jacobian matrix has rank deficiency two at 
the solution. In this test, we took n = 1000, xo = [— 1, — 1, • • •, —1]. When calling the 
GMRES routine we limited the number of Arnoldi iterations between restarts to 20. 



Figure 7: Results for a rank one deficient modification of the Broyden Tridiagonal problem, 
xo = [— 1, — 1, • • — 1] T . Diagonal preconditioning. Solid line: tensor-GMRES; Dotted-line: 

Newton-GMRES. 



Figure 8: Results for a rank two deficient modification of the Broyden Tridiagonal problem. 
Xo — [ — 1, — 1,***, — 1]^. Diagonal preconditioning. Solid line: tensor-GMRES; Dotted-line: 
Newton-GMRES. 

Figure 8 shows that the tensor-GMRES method again has a significant improvement over 
the Newton-GMRES counterpart in number of nonlinear iterations. The Newton-GMRES 
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method is exhibiting linear convergence similar to the rank one deficient situation, while 
the tensor-GMRES method show superlinear convergence. The convergence pattern of the 
tensor-GMRES method again is slightly different from the rank one deficient situation. The 
convergence here seems to be two-step superlinear, while in the rank one deficient situation 
the convergence seems to be one-step superlinear. It took the tensor-GMRES method 11 
iterations while it took the Newton-GMRES method 22 iterations to reach almost the same 
level of accuracy. The margin of improvement is 50% in number of nonlinear iterations. 

5.4 Test results for one-dimensional Euler equations 

One of the target applications for the tensor-Krylov methods is the nonlinear differential 
systems arising in physical problems, e.g. aerodynamics. One good model problem is 
the quasi-one-dimensional (ID) Euler equations for flow through a nozzle with a given area 
ratio. In particular, transonic conditions which generate a shock within the nozzle present a 
difficult test case, where methods typical of practical aerodynamic applications are required. 
Such methods include, finite difference, finite element, and unstructured grid finite volume 
techniques employing various forms of highly nonlinear algorithm constructions. For our 
purposes here, we have chosen one popular form of central finite differences with nonlinear 
artificial dissipation, see [21] for general details. 

The quasi- ID Euler equations are 

F(Q) - <9 X E(Q) - H(Q) = 0 0.0 < x < 1.0 (5.8) 

where 


* p ' 


pu 


0 

pv 

H 

3 

II 

H 

pu 2 + p 

, H = 

-pd x a(x) 

e 


,u(e + p). 


0 


with p (density), u (velocity), e (energy), p = (7 - l)(e - O.bpu 2 ) (pressure), 7 = 1.4 (ratio 
of specific heats), and a(x) = (1. — 4.(1 — at)x(l — x)) (the nozzle area ratio), with at = 0.8 . 
For a given area ratio and shock location (here x = 0.7) an exact solution can be obtained 
from the method of characteristics. 

We elect to use second order central differences 

d x u « 6 x uj = Uj+ \~ Uj - ~ j = 0, . . J N Ax = 1.0/ Jn, = u(jAx) (5.10) 

J 2Ax 

It is common practice and well known that artificial dissipation must be added to the 
discrete central difference approximations in the absence of any other dissipative mechanism, 
especially for transonic flows. Nonlinear dissipation as defined in [22], is used where 2 nd 
order, D 2 ( Q), and 4 th order, D 4 (Q) difference formulas are employed. 

D 2 (Q ) = V x (<7j +1 + (Tj) (fj 2) A x Qj) 

D 4 ( Q) = -V*(<r j+ i + a;) (c^A^A^Q,) 
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(5.11a) 

(5.11b) 



with 


V^j j = qj-qj-i, = qj+\ - qj (5.11c) 

cf 1 = «2 max(Tj+i, Tj, Tj_i) 

T _ \Pj±i ~ 2 P: + Pj-i 1 
3 \Pj+i + 2pj+Pj-i| 

= majc(0, «4 - €j 2 ^) (5. lid) 


where typical values of the constants are k 2 — 1/4 and K 4 — 1/100. The term Oj |v| 4- c 
(where c = VXtp/p) fhe speed of sound) is a spectral radius scaling. 



Figure 9: Results for ID Euler using full nonlinear dissipation, Solid Line: tensor-GMRES; 
Dotted-line: Newton-GMRES. 

Boundary operators at j = 0 and j — Jn are defined in terms of physical conditions 
(taken from exact solution values) and the use of Riemann invariants. For this problem both 
inflow and outflow boundaries are subsonic and locally one-dimensional Riemann invariants 
are used. The locally one- dimensional Riemann invariants are given in terms of the velocity 
component as 

«! = u- 2 c/( 7 - 1) and R 2 = u + 2 c /(7 - 1). (5.12) 

The Riemann invariants R\, R 2 are associated with the two characteristic velocities X\ = 
u — c and A 2 = u + c respectively. One other equation is needed so that the three flow 
variables can be calculated. We choose 5 = In (p/p y ) where S is entropy. For subsonic 
inflow u < c characteristic velocity A 2 > 0 carrys information into the domain and therefore 
the characteristic variable R 2 can be specified along with one other condition. The Riemann 
invariant R 2 , and S are set to exact values. The other characteristic velocity Ai < 0 carries 
information outside the domain and therefore, Ri is extrapolated from the interior flow 
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variables. On subsonic outflow u < c and A 2 > 0 carries information outside the domain, 
while Ai < 0 propagates into the domain, so only R\ is fixed to exact values and /Z2? an d 
ln(5) are extrapolated. Once these three variables are available at the boundary the three 
flow variables Q can be obtained. If we consider the boundary procedure as an operator on 
the interior data, we can cast the boundary scheme as 

B( Q) t = Q t -B(Q t+1 ) = 0 i = 0 

and 

B(Q)i = Qt - B(Qt-i) = 0 i = J N , 

which are nonlinear equations at the boundaries. 



Figure 10: Results for ID Euler using unlimited dissipation, Solid Line: tensor-GMRES; 
Dotted-line: Newton-GMRES. 

The total system we shall solve is 

T(n\ — / WQ), " H(Q), + D](Q) + D*(Q), j = 

1 5(Q), = 0, i = 0,J N 

The Jacobian matrix for (5.13) is obtained in two ways. An approximated Jacobian is 
formed analytically except, where due to the non-differential form of the e s, the nonlinear 
coefficients for the artificial dissipations, D 2 and D 4 are frozen at the linearized state, 
i.e., they are not linearized. In another form, the Jacobian is obtained through a Frechet 
derivative, where error tolerances are appropriately chosen. The results presented below 
are basicly independent of the choice of Jacobian linearization. The order of the system is 
N = ( J N + 1) x 3. A key element of the success of the solution using the Krylov subspace 
methods is the choice of preconditioning. This issue for systems such as (5.13), which are 
not diagonally dominate, is not straightforward and is still the subject of active research. 
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We shall not go into the details of the preconditioner here, and only state that the same 
preconditioner is used for both the Algorithm NG and TG so that consistent comparisons 
can be made. 



(c) (d) 


Figure 11: Pressure distribution: Solid Line: Computed; Dotted Line: Exact, (a) and (b) 
Full Nonlinear Dissipation; (c) and (d) Unlimited Dissipation. 

Figure 9 shows Algorithm NG and Algorithm TG applied to (5.13) for Jn = 200; 
N = 603. In the case of Algorithm NG the convergence appears linear taking approximately 
125 steps to converge, while Algorithm TG shows about a factor of 6 decrease in the 
number of nonlinear iterations and appears to be at least fast linear. To date, our analysis 
indicates that the system derived form (5.13) is nonsingular and so we do not consider this 
an example similar to the singular ones presented above. But, we have demonstrated, at 
least numerically at this time, that the non differential nature of the nonlinear dissipation 
coefficients in (5.1 la-5.1 Id) is the source of the linear behavior observed. Figure 10 shows 
the convergence results with k 2 = 0, resulting in a quadratic-like convergence from both 
Algorithm NG and TG. Nonlinear switching, such as defined in (5.11a-5.11d), is typical of 
current numerical algorithms for the Euler and Navier-Stokes equations. They may take a 
similar form to (5.1 la-5, lid), see [22] or be in the form of limiters for upwind techniques, e.g. 
[28], [16], The nonlinear switching (limiting) is necessary to eliminate overshoots at shocks, 
where higher order schemes are limited to lower order which more correctly differences the 


34 




equations across discontinuities. Figure 11, shows the results with and without limiting, 
notice the overshoots across the discontinuity which are more pronounced for the unlimited 
case. In general, some form of limiting will be required and in these cases Algorithm TG 
appears to be capable of at least fast linear convergence in contrast to Algorithm NG’s slow 
linear convergence. 

5.5 Summary and discussion of test results 

The test results of this section indicate that the tensor-GMRES algorithm that we tested 
is more efficient in number of nonlinear iterations than the analogous Newton-GMR.ES 
method on singular and nonsingular problems, and significantly more efficient on problems 
where the Jacobian matrix has a small rank deficiency at the solution or the function has 
discontinuity. Since no failure was detected for either of the methods in the tests, it is 
inconclusive which method would be more robust. We observed that for each nonlinear 
iteration, the number of restarts of the GMRES algorithm for solving the Newton equation 
was ranging from 3 to 20 (recall that for all test problems the number of Arnoldi iterations 
allowed in the GMRES algorithm was set to 20). As discussed in Section 4, in the tensor 
algorithm the computation of the tensor step only uses the Krylov subspace that is produced 
by the last restarted GMRES algorithm. Statistically, the size of the last Krylov subspace 
that is used in the tensor model formation and solution is likely to be half of the number 
of Arnoldi iterations allowed in the GMRES algorithm (for example 20/2 — 10 for our 
tests). As analyzed in Subsection 4.7, the extra computational cost of each tensor iteration 
is only a fraction of the cost of the last restarted GMRES algorithm. Hence for our test 
problems (some of them are easy to solve), statistically, the extra cost of one tensor-GMRES 
iteration would be ranging from 2.6% (when 20 restarts of GMRES were needed) to 20% 
(when 3 restarts of GMRES were needed), compared to the cost of one Newton-GMRES 
iteration. In addition, from our experience the tensor-GMRES did not generate iterates 
where the Newton equations would be harder to solve. Hence the savings of the tensor- 
GMRES algorithm in number of nonlinear iterations can be roughly translated into savings 
in overall computational costs, especially for problems where more restarts of the GMRES 
algorithm were needed to solve the Newton equations. In general, at each iteration, the 
more restarts of the GMRES algorithm that are required for solving the Newton equation, 
the lower the extra cost for a tensor-GMRES step. For real world problems, it is likely that 
at each nonlinear iteration, the solution of the Newton equation would require a significant 
number of restarts of the GMRES algorithm. Hence the extra cost of the tensor-GMRES 
method is likely to be minimal in practice. 

6 Summary and topics for future research 

This paper has introduced the tensor-GMRES method for systems of nonlinear equations. 
This method has similar requirement for storage and arithmetic per iteration to the Newton- 
GMRES method. This method is also consistent with preconditioning and matrix free 
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implementation. An implementation of full nonlinear algorithm using the tensor-GMRES 
method has shown to be more efficient on both nonsingular and singular problems than 
analogous implementation of the Newton-GMRES method. The efficiency advantage of 
the tensor-GMRES method is particularly large on problems where the Newton-GMRES 
method exhibits linear convergence (due to singularity or discontinuity). 

Based on these results, it would appear worthwhile to continue research on tensor-Krylov 
methods on nonlinear equations. The two main topics for future research would appear to be 
practical implementation and farther testing of the tensor-GMRES methods for nonlinear 
equations, and new tensor-Krylov methods for nonlinear equations. We discuss each of 
these briefly. 

As seen in Section 5, our implementation is still in early stage. Several directions 
can be pursued immediately to improve the current implementation: (1) Scaling in both 
the variable space and the function space can be added; (2) Matrix free implementation 
of the tensor-GMRES method, which can be achieved in a fashion similar to analogous 
implementation of the Newton-GMRES method, can be pursued; (3) More sophisticated 
stopping criteria in the nonlinear algorithm can be included; (4) More global convergence 
strategies such as model trust region techniques can be integrated. We would like to continue 
our testing of the tensor-GMRES method on more practical problems. One interesting task 
is to test the tensor-GMRES method on the ARC2D code [21] that is the two dimension 
version of the ARC1D code that we tested in Section 5. 

Secondly, new tensor-Krylov methods can be developed. An immediate direction that 
one can pursue is a tensor- Arnoldi method since the Arnoldi’s method for linear systems is 
closely related to the GMRES method for linear systems. A less straightforward direction 
that can be pursued in the future is to combine tensor methods with Krylov methods 
that use two mutually orthogonal sequences such as BiCG and QMR. We are currently 
investigating this possibility. 
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